High Scalability of HDFS using Distributed Namespace

Authors

Harcharan Jit Singh

V. P. Singh

Abstract

Hadoop is widely used by organizations for data-intensive computing. Its client applications require high availability and scalability from the system. Most of these applications are online, and their data growth rate is unpredictable. The current Hadoop relies on a secondary namenode for failover, which slows down the performance of the system. The scalability of a Hadoop system depends on the vertical scalability of the namenode server. As the namespace of the Hadoop Distributed File System (HDFS) grows, it demands additional memory to cache. When a namenode server does not have enough primary memory to cache the namespace, its performance and availability suffer. A new Hadoop architecture is proposed to address namenode scalability, the single point of failure, and the availability of Hadoop. The approach is based on distributing the namespace using distributed hash tables. The growing HDFS namespace is distributed across multiple namenode servers. The proposed architecture is simulated using multiple namenode servers, which are arranged in a Chord ring. This allows HDFS to scale horizontally. The system provides a decentralized management approach for namespace distribution, which gives consistent performance. Results for an HDFS namespace storing 1 billion or more files are discussed in this research work. The proposed architecture shows high availability and adapts to namenode failure.

General Terms: Data intensive computing, Scalability, Failover, Availability
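The abstract does not show how the namespace is actually partitioned, so the following is only a minimal sketch of the general idea: file paths and namenode names are hashed onto one identifier ring, and each path is owned by the first namenode at or after its position (its successor), as in Chord. The class `NamespaceRing`, the function `ring_id`, the 32-bit ring size, and the namenode names are all assumptions made for illustration. Chord's finger tables and any metadata replication are left out.

```python
import bisect
import hashlib

RING_BITS = 32  # assumed ring size for readability; not taken from the paper


def ring_id(key: str) -> int:
    """Hash a string onto the identifier ring [0, 2**RING_BITS)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** RING_BITS)


class NamespaceRing:
    """Hypothetical ring of namenodes; each path belongs to its successor node."""

    def __init__(self, namenodes):
        self._ids = []    # sorted ring positions of the namenodes
        self._owner = {}  # ring position -> namenode name
        for name in namenodes:
            self.add(name)

    def add(self, name: str) -> None:
        pos = ring_id(name)
        if pos in self._owner:
            raise ValueError(f"ring position collision for {name!r}")
        bisect.insort(self._ids, pos)
        self._owner[pos] = name

    def remove(self, name: str) -> None:
        """Drop a namenode, for example after it fails."""
        pos = ring_id(name)
        if self._owner.get(pos) != name:
            raise KeyError(name)
        self._ids.remove(pos)
        del self._owner[pos]

    def lookup(self, path: str) -> str:
        """Return the namenode responsible for the metadata of `path`."""
        if not self._ids:
            raise LookupError("no namenodes in the ring")
        # First namenode at or after the path's position, wrapping around.
        i = bisect.bisect_left(self._ids, ring_id(path))
        return self._owner[self._ids[i % len(self._ids)]]


if __name__ == "__main__":
    ring = NamespaceRing(["nn1", "nn2", "nn3", "nn4"])
    path = "/user/data/part-00000"
    print(path, "->", ring.lookup(path))
    ring.remove(ring.lookup(path))  # simulate failure of the owning namenode
    print(path, "->", ring.lookup(path))
```

In this sketch, removing a namenode only reassigns the paths that namenode owned; every other path keeps its owner. That property is what lets a ring-based namespace grow or shrink one server at a time. How the proposed architecture restores the metadata of a failed namenode is not described in the abstract and is not modeled here.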

Similar resources

A Model-Based Namespace Metadata Benchmark for HDFS

Efficient namespace metadata management is increasingly important as next-generation storage systems are designed for peta and exascales. New schemes have been proposed; however, their evaluation has been insufficient due to a lack of an appropriate namespace metadata benchmark. We describe MimesisBench, a novel namespace metadata benchmark for next-generation storage systems, and demonstrate i...

DNN: A Distributed NameNode Filesystem for Hadoop

The Hadoop Distributed File System (HDFS) is the distributed storage infrastructure for the Hadoop big-data analytics ecosystem. A single node, called the NameNode of HDFS, stores the metadata of the entire file system and coordinates the file content placement and retrieval actions of the data storage subsystems, called DataNodes. However, the single NameNode architecture has long been viewed a...

Cross-Partition Protocols in a Distributed File Service

Keywords: distributed file system, distributed namespace, fault tolerance, Storage Area Network (SAN). A number of ongoing research projects follow a partition-based approach in order to achieve high scalability for access to the distributed storage service. These systems maintain a namespace that references objects distributed across multiple locations in the system. Typically, atomic commitment protocol...

HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases

Recent improvements in both the performance and scalability of shared-nothing, transactional, in-memory NewSQL databases have reopened the research question of whether distributed metadata for hierarchical file systems can be managed using commodity databases. In this paper, we introduce HopsFS, a next generation distribution of the Hadoop Distributed File System (HDFS) that replaces HDFS’ sing...

A Novel Approach for Improving Security and Storage Efficiency on HDFS

Distributed file systems for the storage of massive files have obvious advantages compared with conventional file systems. For instance, the Hadoop Distributed File System (HDFS), implemented on commodity hardware, has the advantages of low cost, high fault tolerance, scalability, etc. However, HDFS carries a potential security risk due to the unencrypted data stored in the Datanode, which may cause da...

Publication date: 2012
